Telecommunications voice server leveraging application web-server capabilities

ABSTRACT

A method for providing voice telephony services can include the step of receiving a call via a telephone gateway. The telephone gateway can convey call identifying data to a resource connector. A media port can be responsively established within a media converter that is communicatively linked to the telephone gateway through a port associated with the call. A call description object can be constructed that includes the call identifying data and an identifier for the media port. The call description object can be conveyed to a telephony application server that provides at least one speech service for the call. The telephony application server can initiate at least one programmatic action of a communicatively linked speech engine. The speech engine can convey results of the programmatic action to the media converter through the media port. The media converter can stream speech signals for the call based upon the results.

BACKGROUND

1. Field of the Invention

The present invention relates to the field of telecommunications and, more particularly, to a telecommunications voice server.

2. Description of the Related Art

The information age has heralded advancements in data accessibility that alter the manner in which people interact socially and economically. Information tools like personal data assistants (PDAs), Web-enabled mobile telephones, computers, vehicle navigation systems, and the like can immerse users in pools of information designed to avoid inconveniences and to generally ease hardships inherent in hectic lifestyles. For example, information tools can help users avoid traffic, maintain social contacts, receive important business email when away from the office, and the like. Key components of the technological infrastructure providing these capabilities include voice server systems, which provide a multitude of speech services, like automatic speech recognition services, synthetic speech generation services, transcription services, language and idiom translation services, and the like.

Implementing robust voice servers in an extensible, cost-efficient manner has been a daunting challenge to service providers. Speech technologies are constantly changing and can require vast hardware and software resources. For example, natural sounding speech generation is commonly performed by concatenative text-to-speech (CTTS) engines, even though hundreds of megabytes of information can be required for storing the phonemes associated with a single CTTS voice, and even though significant processing resources can be involved in constructing synthetic speech from these phonemes. Providing other speech services presents similar challenges. For example, natural language interpretation within automatic speech recognition (ASR) engines can require vast neural networks to interpret speech input with reasonable accuracy.

As if these complexities were not enough, telecommunication protocols, call management services, and telephony features must be managed by a voice server that provides speech services for telephony communications. That is, conventional voice server systems include call session management features, remote access capabilities, lifecycle management, load distribution, and other telephony related features that are typically handled internally for performance reasons. Performance of a voice server can be significant because voice services are often required for real time and near-real time tasks, making appreciable processing delays problematic. It would be highly advantageous if telecommunication related features of existing telecommunication application servers could be leveraged by voice server systems so that these features need not be separately implemented within voice server systems.

SUMMARY OF THE INVENTION

The present invention provides a complete telecommunications voice server that provides telephony, speech processing, and application services via a standard language using a service browser. The voice server described herein leverages capabilities that exist within an application server, such as the Websphere Application Server (WAS) from International Business Machines, Inc. of Armonk, N.Y. Notably, utilizing an existing product base when implementing the voice server minimizes code development and maintenance costs while maximizing functionality.

For example, when integrated with the WAS, the disclosed voice server need not separately implement session management, lifecycle management, remote access, error tracking, pooling, and similar functionality. Since each of these capabilities has been optimized for the WAS, the runtime performance and resource efficiency of the disclosed voice server are high. Moreover, when integrated with the WAS, the disclosed voice server system can utilize software objects and libraries, such as Java (TM) 2 Platform, Enterprise Edition (J2EE), developed for and utilized by the WAS, further minimizing development and maintenance costs, while providing a platform independent, scalable, and extensible solution.

One aspect of the present invention can include a method for providing voice telephony services. The method can include the step of receiving a call via a telephone gateway. The telephone gateway can convey call identifying data to a resource connector. A media port can be responsively established within a media converter that is communicatively linked to the telephone gateway through a port associated with the call. A call description object (CDO) can be constructed that includes the call identifying data and an identifier for the media port. The CDO can be conveyed to a telephony application server that provides at least one speech service for the call. The telephony application server can initiate at least one programmatic action of a communicatively linked speech engine. The speech engine can convey results of the programmatic action to the media converter through the media port. The media converter can stream speech signals for the call based upon the results.

Another aspect of the present invention can include a telephony system with speech capabilities. The system can include a telephony gateway, a telephone application server, a resource connector, and a media converter. The telephony gateway can be communicatively linked to a telephone network, such as a public switched telephone network (PSTN). The telephone application server can provide at least one speech service. In one embodiment, the telephone application server can include a WAS. The resource connector can be a communication intermediary between the telephone gateway and the telephone application server, where call information can be gathered by the resource connector and conveyed to the telephone application server. The media converter can be a communication intermediary between the telephone gateway and the application server, where speech signals can be streamed between the telephone gateway and at least one speech engine.

BRIEF DESCRIPTION OF THE DRAWINGS

There are shown in the drawings embodiments that are presently preferred; it being understood, however, that the invention is not limited to the precise arrangements and instrumentalities shown.

FIG. 1 is a schematic diagram illustrating a telecommunication application server providing speech services in accordance with the inventive arrangements disclosed herein.

FIG. 2 is a flow chart illustrating a method for implementing telecommunication speech services in accordance with the inventive arrangements disclosed herein.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 is a schematic diagram illustrating a telephony system 100 in accordance with the inventive arrangements disclosed herein. The system 100 can include a telephone gateway 115, a telephony application server 150, and a multitude of speech engines 130. The telephone gateway 115 can include hardware or software that translates protocols and/or routes calls between a telephone network 110 and the application server 150. For example, the telephone gateway 115 can include a Cisco 2600 series router from Cisco Systems, Inc. of San Jose, Calif., a Cisco 5300 series gateway, a Digital Trunk eXtended Adapter (DTXA), an Intel (R) Dialogic (R) Adaptor from Intel Corporation of Santa Clara, Calif., and the like. The speech engines 130 can include one or more automatic speech recognition engines 134, one or more text to speech engines 132, and other speech related engines and/or services.

The application server 150 can include an engine that functions as a reliable foundation for handling high volume secure transactions and Web services. In one embodiment, the application server 150 can be a Websphere Application Server (WAS).

The application server 150 can also include a multitude of component servers, such as telephone server 160, dialogue server 170, and speech server 180, communicatively linked via a multitude of Web servers 152. Each Web server 152 can include one or more plug-ins 154, where each plug-in 154 can include routines for conveying data to particular component servers within the application server 150. Each of the component servers of the application server 150 can be implemented as a virtual machine, such as a virtual machine adhering to the Java 2 Enterprise Edition (J2EE) specification.

In one arrangement, component servers of the application server 150 can also be distributed across a network. In such an arrangement, data can be conveyed to each of the component servers via the Web servers 152. The Web servers 152 can utilize Hypertext Transfer Protocol (HTTP) for compatibility with IP sprayers and firewalls. The invention, however, is not limited in this regard and other data conveyance protocols can be used. For example, file transfer protocol (FTP) can be used to convey data between component servers.

The component servers within the application server 150 can include a telephone server 160, a dialogue server 170, and a speech server 180. The telephone server 160 can control the setup, monitoring, and tear down of phone calls. In one arrangement, telephone server 160 can include a web container 162 and an Enterprise Java Beans (EJB) container 164. Moreover, the telephone server 160 can include a call control servlet (servlet A), a call control EJB (Bean B), and a call control interpreter EJB (Bean C).

The dialogue server 170 can manage tasks relating to call dialogue for the application server 150. In one arrangement, the dialogue server 170 can include web container 172 and EJB container 174. Moreover, the dialogue server 170 can include a voice markup interpreter EJB (Bean D).

The speech server 180 can handle one or more speech services for the application server 150. In one arrangement, the speech server 180 can include web container 182 and EJB container 184. Moreover, the speech server 180 can include an automatic speech recognition (ASR) EJB (Bean E) as well as a text-to-speech EJB (Bean F). Bean E and Bean F can be interface components, each of which interfaces with an identified speech engine 130. It should be appreciated by one of ordinary skill in the art that the telephone server 160, the dialogue server 170, and the speech server 180 can be arranged in a multitude of fashions and that the invention is not to be limited to the illustrative arrangement presented herein.

The system 100 can also include a media converter 125 and a resource connector 120. The media converter 125 can be a communication intermediary for streaming speech data, configured to resolve protocol issues between the telephone gateway 115 and speech engines 130. Speech data can be streamed bi-directionally between the telephone gateway 115 and the speech engines 130 as appropriate.

The resource connector 120 can be a communication intermediary between the telephone gateway 115 and the application server 150 and/or media converter 125 that allocates resources for calls. In one embodiment, the resource connector 120 can normalize a telephony request into a request that is acceptable to the application server 150, thereby providing a generic means for the telephone gateway 115 to interface with the application server 150. For example, if the application server 150 communicates using HyperText Transfer Protocol (HTTP) messages, the resource connector 120 can convert a telephony request into an appropriate HTTP message. In another example, if the application server 150 utilizes a Session Initiation Protocol (SIP), the resource connector 120 can convert a telephony request into an appropriate SIP message.
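The normalization performed by the resource connector 120 can be illustrated with a minimal Python sketch. It is not the disclosed implementation; it only shows one way a protocol-neutral telephony request could be rendered as either an HTTP-style or a SIP-style message. The field names (call_id, caller, called, gateway_port), the /callcontrol path, and the host name appserver.example are hypothetical and chosen solely for illustration, and the SIP output is a skeletal message rather than a complete one.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class TelephonyRequest:
    """Protocol-neutral call data; all field names are hypothetical."""
    call_id: str
    caller: str
    called: str
    gateway_port: int


def normalize(request: TelephonyRequest, protocol: str,
              host: str = "appserver.example") -> str:
    """Render the request in the message form the application server accepts."""
    if protocol == "HTTP":
        # Encode the call data as a form body and wrap it in a POST request.
        body = urlencode({
            "callId": request.call_id,
            "caller": request.caller,
            "called": request.called,
            "gatewayPort": request.gateway_port,
        })
        return (
            "POST /callcontrol HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Content-Type: application/x-www-form-urlencoded\r\n"
            f"Content-Length: {len(body)}\r\n"
            "\r\n"
            f"{body}"
        )
    if protocol == "SIP":
        # Skeletal INVITE only: a complete SIP message needs further
        # headers (Via, CSeq, and so on) that are omitted here.
        return (
            f"INVITE sip:{request.called}@{host} SIP/2.0\r\n"
            f"From: <sip:{request.caller}@{host}>\r\n"
            f"To: <sip:{request.called}@{host}>\r\n"
            f"Call-ID: {request.call_id}\r\n"
            "\r\n"
        )
    raise ValueError(f"unsupported protocol: {protocol}")
```

Keeping the request protocol-neutral and selecting the output form at the last step mirrors the idea that the telephone gateway 115 need not know which message format the application server 150 expects.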

The system 100 can further include one or more remote servers 140. Each remote server 140 can perform programmatic actions requiring functions inherent to the application server 150 using software interfaces G and H, which can be Java software objects. Software interfaces G and H can expose otherwise private functions and parameters to remote processes. For example, the software interfaces G and H can permit server 140 to access data objects within a dynamic cache service, such as the dynacache included within WAS. In one embodiment, the software interface G can include a call control markup language and a voice markup language cache servlet. In another embodiment, the software interface H can include a grammar cache servlet. It should be noted that the software interfaces G and H can be disposed throughout the application server and need not be arranged as illustrated within FIG. 1. That is, each of the telephone server 160, the dialogue server 170, and the speech server 180 can include one or more of the software interfaces G and/or H.

In operation, a user 105 can initiate a telephone call. The call can be conveyed through a telephone network 110, such as a Public Switched Telephone Network (PSTN), and can be received by the telephone gateway 115. The telephone gateway 115 can convey call information to the resource connector 120. For example, call information can be conveyed using a session initiation protocol (SIP). In particular embodiments, the telephone gateway 115 can also convert circuit-switched data to packet-switched data for processing by the resource connector 120, media converter 125, and application server 150. The resource connector 120 can generate a call descriptor object (CDO) that contains call related information, including the port(s) that telephone gateway 115 has assigned to the call. In one embodiment, the CDO can be a Java object and the assigned port(s) can include Reliable Data Protocol (RPT) port(s).

Once generated, the CDO can be sent to the media converter 125, which can establish one or more media ports that can be used for the call. Identifiers, which can be Uniform Resource Identifiers (URI), associated with the reserved media ports can be added to the CDO. The CDO can then be conveyed to various component servers within the application server 150 as needed, including the telephone server 160, the dialogue server 170, and the speech server 180. As the CDO is conveyed through component servers, additional information can be included within it. For example, a URI for a call control component within the telephone server 160 can be included within the CDO as the CDO is conveyed through the telephone server 160. Speech services can be triggered within the application server 150 as appropriate and provided for the call via the media converter 125 as needed in accordance with the details of the call being handled by the application server 150.
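The way a CDO accumulates information as it travels can be sketched as a small data structure. The following Python sketch is an illustrative assumption rather than the disclosed Java object: the class name, the field names, and the example URIs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CallDescriptionObject:
    """Hypothetical CDO: call data plus identifiers added along the way."""
    call_id: str
    caller: str
    called: str
    gateway_ports: List[int]
    # Filled in by the media converter once media ports are reserved.
    media_port_uris: List[str] = field(default_factory=list)
    # Filled in by component servers as the CDO passes through them.
    component_uris: Dict[str, str] = field(default_factory=dict)

    def add_media_port(self, uri: str) -> None:
        """Record the identifier of a reserved media port."""
        self.media_port_uris.append(uri)

    def annotate(self, component: str, uri: str) -> None:
        """Record the URI of a component that handled this call."""
        self.component_uris[component] = uri


# The resource connector builds the CDO from gateway-supplied call data.
cdo = CallDescriptionObject("c-1", "5551000", "5552000", [7])
# The media converter adds an identifier for the port it reserved.
cdo.add_media_port("media://converter.example/port/4000")
# The telephone server adds the URI of its call control component.
cdo.annotate("call_control", "http://telephone.example/callcontrol")
```

Because every component reads from and writes to the same object, a later component such as the speech server can locate the media port without querying the media converter again.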

FIG. 2 is a flow chart of a method 200 for implementing telecommunication speech services in accordance with the inventive arrangements disclosed herein. The method 200 can be performed in the context of a telecommunications application server, such as a WAS. The method 200 can begin in step 205, where various telephony application server components can be initialized as appropriate. For example, a servlet within the telephony server component can activate a resource container that functions as a communication intermediary between a telephony gateway and the telephony application server. In another example, a speech server component can allocate a multitude of component interface objects, each interface object being associated with a particular speech engine. For example, a pool of stateless EJBs can be allocated within the speech server, each configured as an interface for a speech engine.
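The pooling of interface objects mentioned for step 205 can be illustrated with a short Python sketch. It stands in for the pool of stateless EJBs only by analogy: an application server container would normally manage such a pool itself, and the class names, the engine names, and the echoing invoke method below are hypothetical.

```python
import queue
from contextlib import contextmanager


class EngineInterface:
    """Stateless stand-in for an interface object bound to one speech engine."""

    def __init__(self, engine_name: str) -> None:
        self.engine_name = engine_name

    def invoke(self, action: str, payload: str) -> str:
        # A real interface would call out to the engine; this only echoes.
        return f"{self.engine_name}:{action}:{payload}"


class InterfacePool:
    """Pre-allocates interface objects and lends them out one at a time."""

    def __init__(self, engine_names) -> None:
        self._idle = queue.Queue()
        for name in engine_names:
            self._idle.put(EngineInterface(name))

    @contextmanager
    def borrow(self, timeout: float = 1.0):
        # Raises queue.Empty if no interface becomes free within the timeout.
        iface = self._idle.get(timeout=timeout)
        try:
            yield iface
        finally:
            # Always return the interface so that other calls can reuse it.
            self._idle.put(iface)

    def idle_count(self) -> int:
        return self._idle.qsize()
```

Because the interface objects hold no per-call state, any idle object can serve any request, which is what allows allocation to happen once at initialization rather than once per call.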

In step 210, a telephony gateway can receive a telephone call that requires at least one speech service. In step 215, call identifying information can be conveyed by the telephone gateway to a resource connector. Call identifying information can include a call identifier, a caller telephone number, a called telephone number, a gateway port associated with the call, and so forth. In step 220, the resource connector can initiate a media converter for the call, where the media converter can serve as a communication intermediary between a speech engine and the telephone gateway. In step 225, the media converter can establish a connection with the calling port of the gateway and can establish at least one media port for receiving speech data. In step 230, the media converter can convey identification information for the established media ports to the resource connector.

In step 235, the resource connector can generate a CDO that includes call identification and media port identification data. In step 240, the resource connector can convey the CDO to the telephone server component of the telephony application server.

In step 245, the telephone server can fetch a call control document and a voice markup document, where the call control document can be a Call Control Extensible Markup Language (CCXML) document and the voice markup document can be a Voice Extensible Markup Language (VoiceXML) document. Identifiers for these documents and/or interpreters for these documents can be added to the CDO. In step 250, the telephone server can manage call control functions by interpreting the call control document. In step 255, the voice markup document and the CDO can be conveyed to a dialogue server component of the telephony application server.

In step 260, the dialogue server can parse the voice markup document into a plurality of work segments called turns. In step 265, the CDO and a turn can be conveyed to a speech server component of the telephony application server. In step 270, the speech server can determine a speech engine to handle the turn. In step 275, the speech server can use the media port identifier within the CDO to link the selected speech engine to the identified media port of the media converter. In step 280, the speech engine can perform at least one programmatic action resulting in a speech output that can be conveyed to the media converter. In step 285, the media converter can stream synthetically generated speech to a calling party via the telephone gateway. The media converter can also receive speech from the calling party and convey it to an appropriate speech engine. In step 290, if there are more turns to process, the method can loop back to step 265, where the next turn can be conveyed to the speech server for processing.

If no more turns exist in step 290, the method can proceed to step 295, where the telephony application server can check to see if there are any more calls that need to be managed. If so, the method can branch to step 210, where the new telephone call can be received. Otherwise, the method can finish.
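The turn-processing loop of steps 260 through 290 can be sketched as follows. This Python sketch rests on several assumptions that the description does not state: that each prompt element yields one text-to-speech turn and each field element yields one recognition turn, that the markup carries no XML namespace, and that speech engines can be modeled as plain callables. The dictionary key media_port and the example URI are likewise hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

# A turn is modeled as (kind, payload): "tts" for prompts, "asr" for fields.
Turn = Tuple[str, str]
Engine = Callable[[str, str], str]


def parse_turns(voice_markup: str) -> List[Turn]:
    """Step 260: split a voice markup document into turns, in document order."""
    root = ET.fromstring(voice_markup)
    turns: List[Turn] = []
    for element in root.iter():
        if element.tag == "prompt":
            turns.append(("tts", (element.text or "").strip()))
        elif element.tag == "field":
            turns.append(("asr", element.get("name", "")))
    return turns


def run_call(voice_markup: str, cdo: Dict[str, str],
             engines: Dict[str, Engine]) -> List[str]:
    """Steps 265 to 290: hand each turn to an engine until none remain."""
    results: List[str] = []
    for kind, payload in parse_turns(voice_markup):
        engine = engines[kind]                  # step 270: select an engine
        # Steps 275 and 280: give the engine the media port identifier from
        # the CDO so that its output can reach the media converter.
        results.append(engine(cdo["media_port"], payload))
    return results
```

A caller could supply a document such as `<vxml><form><block><prompt>Welcome</prompt></block><field name="city"/></form></vxml>` together with one callable per engine kind; the loop then ends naturally when the list of turns is exhausted, which corresponds to the negative branch of step 290.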

The present invention can be realized in hardware, software, or a combination of hardware and software. The present invention can be realized in a centralized fashion in one computer system or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited. A typical combination of hardware and software can be a general-purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein.

The present invention also can be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which when loaded in a computer system is able to carry out these methods. Computer program in the present context means any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form.

This invention can be embodied in other forms without departing from the spirit or essential attributes thereof. Accordingly, reference should be made to the following claims, rather than to the foregoing specification, as indicating the scope of the invention.

1. A method for providing voice telephony services comprising the steps of: receiving a call via a telephone gateway; said telephone gateway conveying call identifying data to a resource connector; responsively establishing at least one media port within a media converter that is communicatively linked to the telephone gateway through a port associated with the call; constructing a call description object that includes said call identifying data and an identifier for said at least one media port; conveying said call description object to a telephony application server that provides at least one speech service for said call; said telephony application server initiating at least one programmatic action of a communicatively linked speech engine; said speech engine conveying results of said programmatic action to said media converter through said at least one media port; and said media converter streaming speech signals for said call based upon said results.
2. The method of claim 1, wherein said telephone application server comprises a telephone server, said method further comprising the steps of: conveying said call description object to said telephone server; said telephone server retrieving a call control document and a voice markup document associated with said call; and, said telephone server interpreting said call control document to manage at least one aspect of said call.
3. The method of claim 1, wherein said telephone application server comprises a dialogue server, said method further comprising the steps of: conveying said call description object and a voice markup document associated with said call and said call description object to said dialogue server; and said dialogue server parsing said voice markup document into a plurality of units of work, wherein each unit of work is capable of being independently processed, and wherein said results of said speech engine are results for one of said units of work.
4. The method of claim 1, wherein said telephone application server comprises a speech server, said method further comprising the steps of: said speech server receiving said call description object and a voice markup document associated with said call; and said speech server initiating said programmatic action of said speech engine responsive to said speech server interpreting said voice markup document, wherein said speech server conveys said media port identifier to said speech engine.
5. The method of claim 1, wherein said telephone application server comprises a speech server, said method further comprising the steps of: instantiating a plurality of platform-independent component software objects within said speech server, each software object providing an interface between said speech server and a software engine, wherein said initiating of said programmatic action utilizes one of said component software objects.
6. A machine-readable storage having stored thereon, a computer program having a plurality of code sections, said code sections executable by a machine for causing the machine to perform the steps of: receiving a call via a telephone gateway; said telephone gateway conveying call identifying data to a resource connector; responsively establishing at least one media port within a media converter that is communicatively linked to the telephone gateway through a port associated with the call; constructing a call description object that includes said call identifying data and an identifier for said at least one media port; conveying said call description object to a telephony application server that provides at least one speech service for said call; said telephony application server initiating at least one programmatic action of a communicatively linked speech engine; said speech engine conveying results of said programmatic action to said media converter through said at least one media port; and said media converter streaming speech signals for said call based upon said results.
7. The machine-readable storage of claim 6, wherein said telephone application server comprises a telephone server, said machine-readable storage further comprising the steps of: conveying said call description object to said telephone server; said telephone server retrieving a call control document and a voice markup document associated with said call; and, said telephone server interpreting said call control document to manage at least one aspect of said call.
8. The machine-readable storage of claim 6, wherein said telephone application server comprises a dialogue server, said machine-readable storage further comprising the steps of: conveying said call description object and a voice markup document associated with said call and said call description object to said dialogue server; and said dialogue server parsing said voice markup document into a plurality of units of work, wherein each unit of work is capable of being independently processed, and wherein said results of said speech engine are results for one of said units of work.
9. The machine-readable storage of claim 6, wherein said telephone application server comprises a speech server, said machine-readable storage further comprising the steps of: said speech server receiving said call description object and a voice markup document associated with said call; and said speech server initiating said programmatic action of said speech engine responsive to said speech server interpreting said voice markup document, wherein said speech server conveys said media port identifier to said speech engine.
10. The machine-readable storage of claim 6, wherein said telephone application server comprises a speech server, said machine-readable storage further comprising the steps of: instantiating a plurality of platform-independent component software objects within said speech server, each software object providing an interface between said speech server and a software engine, wherein said initiating of said programmatic action utilizes one of said component software objects.
11. A telephony system with speech capabilities comprising: a telephony gateway communicatively linked to a telephone network; a telephone application server including a plurality of virtual machines, said virtual machines including a telephone server, a dialogue server, and a speech server; a resource connector that is a communication intermediary between said telephone gateway and said telephone application server, wherein call information is gathered by said resource locator and conveyed to the telephone application server; and a media converter that is a communication intermediary between said telephone gateway and said application server, wherein speech signals are streamed between the telephone gateway and at least one speech engine.
12. The system of claim 11, wherein said telephone server is configured to fetch a call control markup document and a voice markup document for a call managed by the telephone application server.
13. The system of claim 11, wherein said dialogue server is configured to parse a voice markup document associated with a telephone call into a plurality of work units and to manage said work units for said telephone call.
14. The system of claim 13, wherein said speech server is configured to provide said at least one speech service responsive to one of said work units conveyed to said speech server from said dialogue server.
15. The system of claim 14, wherein at least one of said speech engines is remotely located from said telephone application server, said speech server comprising: at least one stateless, platform-independent, software interface component, each software interface component being associated with a particular one of said remotely located speech engines.
16. The system of claim 11, said media converter comprising: at least one media port for streaming speech signals, wherein said media port is established at a time before an immediate need to stream speech signals has been identified.
17. The system of claim 11, said system further comprising: a call description object containing said call information, wherein said call description object is conveyed between said resource connector and said telephone application server and between virtual machines within said telephone application server.
18. The system of claim 17, said media converter comprising: at least one media port established responsive to a call being received by said telephone gateway, wherein an identifier for said media port is included within said call description object.
19. The system of claim 18, said telephone application server further comprising: at least one web server used to convey said call description object to at least one of said virtual machines.
20. The system of claim 11, wherein said telephone application server is a Websphere-type Application Server.
21. The system of claim 11, further comprising: means for normalizing a telephony request received by the telephony gateway into a request that is acceptable within an environment of the telephone application server.
22. A method for providing voice telephony services comprising the steps of: receiving a call via a telephone gateway; packetizing call information into a plurality of messages normalized for an application server environment; initializing at least one media socket within a media converter, where said media socket is allocated for approximately a duration of the call; conveying said messages to a telephone application server that provides at least one speech service for said call; said telephony application server initiating at least one programmatic action of a communicatively linked speech engine; said speech engine establishing a communication link with said converter via said media socket; and conveying audio signals between said speech engine and said telephone gateway using said media converter as a communication intermediary, wherein once said speech engine has completed executing said at least one programmatic action, disconnecting said speech engine from said media socket so that said media socket is available for communications with other speech engines.
23. The method of claim 22, said packetizing step further comprising packetizing call information into a plurality of HyperText Transfer Protocol messages.
24. The method of claim 22, said packetizing step further comprising packetizing call information into a plurality of Session Initiation Protocol messages.